Control of extraneous variables is a fundamental condition for causal interpretations of 
research (Johnson, 2001). Randomization of participants to treatment conditions has 
long been considered a powerful method of control, so much so that this is the 
distinguishing characteristic between true experimental and other types of research 
(Campbell & Stanley, 1963). When a researcher uses randomization, it is clear that the 
basis upon which participants receive treatment conditions is unrelated except by 
chance to any variable that can be confounded with the treatments. 

A great deal of research is done in field settings in education. State-level or 
district-based researchers, for example, are often interested in practical interventions 
that can occur naturally in schools. However, randomization is typically unavailable to 
those who work in field settings because the investigator is not able to manipulate 
treatment conditions at the level of the individual participant. This often arises because 
institutions such as schools are reluctant to move participants (e.g., students) from 
group to group (e.g., class to class) or otherwise assign them to groups according to 
researcher needs. Similarly, it may not be possible even to determine randomly which 
group receives which treatment condition, that being decided through other means, 
such as teacher choice. 

Failing randomization, one approach used in the field is to measure extraneous 
variables and employ statistical control (e.g., analysis of covariance). Pedhazur (1997) 
describes three common contexts for statistical control with intact groups: attempting to 
equate them on the outcome variable(s) using one or more pretest(s), attempting to 
control for other variable(s) in looking at mean differences, and attempting to control for 
other variable(s) in looking at differences in regressions. He points out that these are 
usually invalid uses of analysis of covariance. 

Because statistical procedures are generally less effective than experimental control, 
theoretical inferences about relationships observed in field settings are often subject to 
multiple reasonable internal validity threats. And in many cases it is not even possible to 
measure extraneous variables effectively, such as when limited time is available, when 
the number of participants in the research is limited, or when the measurement is too 
intrusive. Johnson (2001) has recently concluded that there is little that can be gained 
from a single, non-experimental research study. A feasible alternative that can enhance 
the ability of field investigators to draw causal inferences in field settings clearly would 
be an advantage. 

In field contexts, there are typically many opportunities available to investigators that are 
not open to researchers in more controlled settings. Laboratory researchers commonly 
have small pools of potential participants to select from and may need to expend 
nontrivial resources to obtain their cooperation. On the other hand, in applied settings 
such as classrooms and schools, and especially for employees of the institution, 
students or other participants are often generously available as long as the intrusion of 
the research is minimal. Many investigators in the field thus have broad feasible 
research opportunities that laboratory researchers do not enjoy. It is therefore possible 
in common applied research settings to be able to repeat, or replicate, a study design 
more than once. 

It is argued here that careful planning of replications can enhance the interpretability of 
applied research. When results are consistent across several studies, there is a 
stronger basis for observed relationship(s) than the support that is available within each 
study by itself, since results that have been replicated are considered more likely to 
generalize (continue to be observed). It is also possible to compare the studies with 
each other to identify constructs that interact with, or moderate, relationships. Although 
these advantages exist whether or not the research includes experimental control, the 
opportunity to replicate a basic study design in multiple field contexts is more likely to be 
available to the applied researcher and is a technique that can lead to stronger 
inferences in any setting. Thus, it is recommended that persons who conduct field 
research try to include replication as a fundamental feature in their studies. 

The analysis of the several studies' results should also be addressed. Meta-analysis is 
an attractive vehicle for combining, or synthesizing, a series of research replications. 
Although meta-analysis is generally thought of as a means for studying an existing 
research literature quantitatively, it also may be used to analyze a series of related 
studies generated within a single project. In the remainder of this article, pertinent 
features of meta-analysis are discussed briefly and then two examples are described in 
which multiple replications of a basic field design have been analyzed using 
meta-analysis to strengthen the evidence available. The basic designs differ markedly in 
the two examples. Finally, some design approaches for applied researchers thinking 
about using replications are discussed. 

META-ANALYSIS 



Meta-analysis is commonly used to synthesize the findings of multiple, but related, 
research studies. Those who are unfamiliar with meta-analysis can find a brief overview 
along with a completely analyzed example in Schafer (1999). More extensive 
discussions on a broad array of topics pertinent to meta-analysis are widely available in 
Hedges & Olkin (1985) and Cooper & Hedges (1994). 

Fundamental to meta-analysis is an effect-size measure calculated within a study. An 
effect-size measure may be used to compare two groups or to relate two variables. For 
example, the difference between two group means divided by the pooled standard 
deviation of the two groups in a study might be the effect-size measure [when adjusted 
for bias, this is Hedges & Olkin's (1985) d index]. Another might be the correlation 
between two variables in a study. In general, to be used in a meta-analysis, an 
effect-size measure must be capable of transformation to a normally distributed statistic 
with a known variance. Under reasonable assumptions, both the examples here are 
appropriate. Techniques are described in the cited sources that allow a researcher to 
model the size of the effect (the effect-size index) as a result of study characteristics. 
That is, equations may be written, as in multiple regression, for relationships between 
study characteristics as predictors and an effect-size index as the criterion. These study 
characteristics may be descriptive of the participants, of the settings, of the treatment 
implementations, or of the outcome variables; in other words, virtually anything that can 
differentiate studies from each other can be used in the analysis as study 
characteristics. 
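
As an illustration of the first effect-size measure mentioned above, the following is a minimal sketch of a bias-corrected standardized mean difference and its approximate sampling variance. The function name `hedges_g` and its parameters are hypothetical, and the correction factor and variance formula are the commonly used large-sample approximations, assumed here rather than taken from the studies discussed in this article.

```python
import math

def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Return (g, var_g) for two independent groups.

    g is the difference between group means divided by the pooled
    standard deviation, multiplied by a small-sample bias correction.
    var_g is the usual large-sample approximation to its variance.
    """
    df = n1 + n2 - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd
    # Approximate bias correction (assumed form: 1 - 3 / (4 * df - 1)).
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    g = correction * d
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2.0 * (n1 + n2))
    return g, var_g

# Hypothetical summary statistics for two groups of 50 participants each.
g, var_g = hedges_g(105.0, 100.0, 10.0, 10.0, 50, 50)
print(round(g, 4), round(var_g, 4))
```

The returned variance is what allows each effect size to be weighted in the synthesis.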

In one typical approach to meta-analysis, an effect-size index is calculated for each 
study. The suitably weighted average of the effect sizes is tested against a null 
hypothesis of zero. Variation of the studies' effect sizes about the average is tested to 
determine whether it is at a greater-than-chance level and, if it is, then a study 
characteristic may be entered into a model (equation), so that effect size is then 
predicted as the sum of a constant (intercept) and a study characteristic scaled with 
(multiplied by) a slope estimate. The slope estimate is tested against the null hypothesis 
of zero. The variance of the effect sizes about the model (the residuals) is compared 
with the chance level. If homogeneity (chance-level variance) is achieved, modeling 
ceases; otherwise further study characteristics are added to the model. Of course, 
variations exist, some as solutions to special problems that may arise; only a very 
(over)simplified treatment is described here. 
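
The steps just described can be sketched in code under a fixed-effects model with inverse-variance weights. This is only an illustration under stated assumptions: the function names and the example data are hypothetical, and the homogeneity statistics (often labeled Q) are returned for comparison with a chi-square critical value rather than converted to p-values.

```python
import math

def weighted_average(effects, variances):
    """Return (mean, se, q): weighted mean effect, its standard error,
    and the homogeneity statistic about the mean (df = k - 1)."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    q = sum(w * (e - mean) ** 2 for w, e in zip(weights, effects))
    return mean, se, q

def one_predictor_model(effects, variances, characteristic):
    """Weighted regression of effect size on one study characteristic.

    Returns (intercept, slope, se_slope, q_residual); q_residual is the
    homogeneity statistic about the model (df = k - 2).
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, characteristic)) / total_w
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, characteristic))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, characteristic, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With known sampling variances, the slope's variance is 1 / sxx.
    se_slope = math.sqrt(1.0 / sxx)
    q_residual = sum(w * (y - intercept - slope * x) ** 2
                     for w, x, y in zip(weights, characteristic, effects))
    return intercept, slope, se_slope, q_residual

# Hypothetical effect sizes, variances, and one coded study characteristic.
effects = [0.1, 0.3, 0.5]
variances = [0.04, 0.04, 0.04]
characteristic = [1.0, 2.0, 3.0]

mean, se, q = weighted_average(effects, variances)
print("z for mean effect:", mean / se, "Q:", q)
b0, b1, se_b1, q_res = one_predictor_model(effects, variances, characteristic)
print("z for slope:", b1 / se_b1, "residual Q:", q_res)
```

The mean and the slope are each divided by their standard errors to give z statistics for the tests against zero. Modeling would stop once the residual statistic falls within the chance level for its degrees of freedom.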

DESCRIPTIVE GAINS FOR SCHOOLS 



A descriptive, or non-experimental, design is one in which there is no manipulation of 
treatments. The research problem studied in Guthrie, Schafer, Von Secker, & Alban 
(2000) concerned the relationships between instructional characteristics of schools and the 
variation they showed in their degrees of gain or loss in student achievement over a 
year's time (growth). The effect-size index was the bias-corrected difference between 
school means at a target grade level between year one and year two on a statewide, 
standardized test, divided by the pooled standard deviation for the two years. The 
indexes were scaled so a positive difference showed improvement. The study was 
replicated in all six tested content areas at both tested grade levels in all 33 schools in 
three volunteer districts for a total of 396 effect sizes. 

The independent variables in the meta-analysis were school means for teacher-reports 
of emphasis devoted to different approaches in reading instruction. All teachers in each 
school were surveyed on a questionnaire with six sub-scales that had been developed 
through factor analysis using data from a fourth volunteer district in an earlier study. 

The meta-analyses were used to evaluate the association of the set of six instructional 
variables to achievement growth, of each variable individually to growth, and of each 
variable as a unique predictor of growth in a six-predictor model. The six content areas 
at each of the grade levels were analyzed separately. The results of the syntheses were 
interpretable and generally consistent with an extensive literature review for these 
variables. 

Although it is statistically possible to compare the two years of data for any one school, 
that single finding by itself would not have been remarkable. While the school might 
have developed instructional hypotheses for the direction and degree of growth 
observed, there would have been far too many plausible competing explanations for the 
difference, such as teacher turnover, test form calibrations, and student aptitude. 
While the replicated study cannot entirely substitute for experimental control 
through randomization, the plausibility of at least some of the rival explanations is 
decreased if instructional explanations can be observed across replications, as they 
were in this example study. Indeed, only by replicating the fundamental growth-study 
design was it possible to study the instructional characteristics of the schools as 
variables used to explain differences among gains across schools. 

DISCUSSION 



Consistent with Johnson's (2001) suggestions for strengthening interpretations of 
causality from non-experimental research, this article has recommended planning 
replications in field settings. The examples illustrate ways in which these replicated field 
designs can be synthesized to enhance the inferences that can be drawn from them. 
Further, when planned replications are used, it is possible also to plan for the 
measurement of variables that should prove useful to model effect sizes in a 
meta-analysis (e.g., the instructional variables in example 1). Fortunately for the 
researcher, a meta-analysis based on planned replications is far more straightforward to 
implement than a traditional synthesis of a disparate literature since fewer challenges, 
such as design differences, inadequate information, and inconsistent reporting of results 
across studies, exist. 

An investigator planning to use replications in field research must make several 
decisions. Some of these are discussed below. 

The basic design. The stronger the basic design, the stronger the inferences that may 
be made from any one replicate, and thus from the overall meta-analysis. The strongest 
feasible design should be chosen. Cook and Campbell (1979) provide an overview of 
designs that are particularly suitable in applied research contexts and discuss their 
strengths and weaknesses. It is important to be very clear what variable is independent 
and what is dependent in the basic design. In the two examples here, the independent 
variable was time (year 1 vs. year 2) in the first and presence or absence of the 
instruction workshop in the second. In both, the dependent variable was achievement. 
While year could not be manipulated in the first (the basic design was 
non-experimental), it was possible to manipulate the workshop in the second. Random 
assignment of instructors to workshop conditions strengthened that study [the basic 
design was pre-experimental (Campbell & Stanley, 1963)]. 

The effect-size measure. Magnitude of effect should be capable of coding as a 
standardized measure indicating direction and strength of relationship between the 
independent and dependent variables. Its quantification should yield an index that is 
normally distributed and has a known or estimable variance. Rosenthal (1994) 
provides a menu of possibilities. Three common examples that differ depending on the 
scaling of the two variables are: both continuous (the correlation coefficient, r); both 
dichotomous (the log-odds ratio, L); or, as in the two examples here, the independent 
variable a dichotomy but the dependent variable continuous (bias-corrected d, 
discussed above). 
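
For the first two cases, a minimal sketch of how each index might be put on a scale with an approximately normal distribution and a known variance is given below. The function names are hypothetical, and the formulas are the standard large-sample results (Fisher's z transformation for r, and the log-odds ratio from a 2 x 2 table of counts), assumed here rather than drawn from the cited sources.

```python
import math

def fisher_z(r, n):
    """Transform a correlation r from a sample of size n.

    Returns (z, var_z), where var_z = 1 / (n - 3).
    """
    z = math.atanh(r)
    return z, 1.0 / (n - 3)

def log_odds_ratio(a, b, c, d):
    """Log-odds ratio for a 2 x 2 table of counts.

    a, b are the counts in the two outcome categories for the first
    group; c, d are the corresponding counts for the second group.
    Returns (L, var_L). All four counts must be greater than zero.
    """
    log_or = math.log((a * d) / (b * c))
    var_log_or = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    return log_or, var_log_or

# Hypothetical inputs.
print(fisher_z(0.5, 103))
print(log_odds_ratio(20, 10, 10, 20))
```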

Maintaining effect-size independence. The effect sizes are assumed to be independent 
in a meta-analysis. That is generally the case across studies, but is not always true 
within studies. In our two examples, each study produced several dependent effect-size 
indices. Dependencies created by the measurement of six content areas in each school 
were ignored in the first study by analyzing each grade level and content area 
separately; in the second study, the six tests were analyzed together at first and a 
Bonferroni-like correction was applied throughout the analyses (Gleser & Olkin, 1994). 
Of course, care should be taken in field studies that the sites at which the replications 
occur maintain separation; sharing of information by participants across replications can 
threaten effect-size independence. 

The variables to be measured. Besides the independent and dependent variables, it is 
advantageous to capitalize on the opportunity to measure variables that could be 
related to effect size (study characteristics). To generate a list of these, the researcher 
might consider how he or she might explain any observed differences that could appear 
among effect sizes across replicates. Whether substantive or artifactual, those 
explanations virtually always will be based on variables that should, if possible, be 
measured. These could be different contexts and dependent variables as in our second 
example in which effect sizes yielded by four different content areas and two test 
formats were combined into one meta-analysis. Or they may be descriptive of persons, 
such as demographics or aptitudes, or settings such as physical features in schools or 
classrooms. Coding characteristics of the replications that produced the different effect 
sizes provides data that are analyzed through relating these characteristics as 
independent variables to the effect sizes as dependent variables in the meta-analysis. 
The potential for assessing study differences that may be related to magnitude of effect 
represents an opportunity for creativity in designing robust multiple-study investigations 
through replication. 

Meta-analysis is a relatively new approach to data analysis and the 
field is changing rapidly. One recent advance has been development of effective 
methods to conduct random-effects model analyses. Hedges & Vevea (1998) present a 
straightforward and relatively simple modification that is consistent with the techniques 
used in the two examples cited here. They also provide a worked example. An 
advantage of using a random-effects model is that the results generalize to a population of 
studies not included in the present analysis, whereas in the two examples described 
here, the conclusions were restricted to the specific replications themselves. Hedges & 
Vevea (1998) discuss the conditions under which each type of analysis, fixed or random, 
is more appropriate. 
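
One common way to carry out a random-effects analysis is sketched below. It estimates a between-studies variance component by the method of moments and adds it to each study's sampling variance before weighting. Whether this matches the cited modification in every detail is an assumption, and the function name and example data are hypothetical.

```python
import math

def random_effects_mean(effects, variances):
    """Return (mean, se, tau2) under a random-effects model.

    tau2 is a method-of-moments estimate of the between-studies
    variance, truncated at zero.
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed_mean = sum(w * e for w, e in zip(weights, effects)) / total_w
    # Homogeneity statistic about the fixed-effects mean (df = k - 1).
    q = sum(w * (e - fixed_mean) ** 2 for w, e in zip(weights, effects))
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Each study's variance is increased by the between-studies component.
    re_weights = [1.0 / (v + tau2) for v in variances]
    total_re_w = sum(re_weights)
    mean = sum(w * e for w, e in zip(re_weights, effects)) / total_re_w
    se = math.sqrt(1.0 / total_re_w)
    return mean, se, tau2

# Hypothetical effect sizes with equal sampling variances.
mean, se, tau2 = random_effects_mean([0.0, 0.4, 0.8], [0.04, 0.04, 0.04])
print(mean, se, tau2)
```

When the estimated variance component is zero, the weights reduce to the fixed-effects weights and the two analyses give the same mean and standard error.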

This digest is based on an article originally appearing in Practical Assessment Research 
and Evaluation 